272 research outputs found

    Regression with Linear Factored Functions

    Many applications that use empirically estimated functions face a curse of dimensionality, because the integrals over most function classes must be approximated by sampling. This paper introduces a novel regression algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow certain integrals to be solved analytically and point-wise products to be calculated. Applications like belief propagation and reinforcement learning can exploit these properties to break the curse and speed up computation. We derive a regularized greedy optimization scheme that learns factored basis functions during training. The novel regression algorithm performs competitively with Gaussian processes on benchmark tasks, and the learned LFFs are very compact, with 4-9 factored basis functions on average. Comment: Under review as conference paper at ECML/PKDD 201

    Predicting Fluid Intelligence of Children using T1-weighted MR Images and a StackNet

    In this work, we use T1-weighted MR images and a StackNet to predict fluid intelligence in adolescents. Our framework includes feature extraction, feature normalization, feature denoising, feature selection, training a StackNet, and predicting fluid intelligence. The extracted feature is the distribution of different brain tissues in different brain parcellation regions. The proposed StackNet consists of three layers and 11 models. Each layer uses the predictions from all previous layers, including the input layer. The proposed StackNet is tested on a public benchmark, the Adolescent Brain Cognitive Development Neurocognitive Prediction Challenge 2019, and achieves a mean squared error of 82.42 on the combined training and validation set with 10-fold cross-validation. It also achieves a mean squared error of 94.25 on the testing data. The source code is available on GitHub. Comment: 8 pages, 2 figures, 3 tables, Accepted by MICCAI ABCD-NP Challenge 2019; Added ND

    Predictive gene lists for breast cancer prognosis: A topographic visualisation study

    Background: The controversy surrounding the non-uniqueness of predictive gene lists (PGL), small subsets of genes selected from the very large pool of candidates available in DNA microarray experiments, is now widely acknowledged [1]. Many of these studies have focused on constructing discriminative semi-parametric models and as such are also subject to the issue of random correlations of sparse model selection in high dimensional spaces. In this work we outline a different approach based around an unsupervised patient-specific nonlinear topographic projection in predictive gene lists. Methods: We construct nonlinear topographic projection maps based on inter-patient gene-list relative dissimilarities. The Neuroscale, Stochastic Neighbor Embedding (SNE) and Locally Linear Embedding (LLE) techniques have been used to construct two-dimensional projective visualisation plots of 70-dimensional PGLs per patient. Classifiers are also constructed to identify the prognosis indicator of each patient using the resulting projections from those visualisation techniques, and to investigate whether the two prognosis groups are separable a posteriori on the evidence of the gene lists. A literature-proposed predictive gene list for breast cancer is benchmarked against a separate gene list using the above methods. Generalisation ability is investigated by using the mapping capability of Neuroscale to visualise the follow-up study, but based on the projections derived from the original dataset. Results: The results indicate that small subsets of patient-specific PGLs have insufficient prognostic dissimilarity to permit a distinction between patients in the two prognosis groups. Uncertainty and diversity across multiple gene expressions prevent unambiguous or even confident patient grouping. Comparative projections across different PGLs provide similar results. Conclusion: The random correlation effect to an arbitrary outcome induced by small subset selection from very high dimensional interrelated gene expression profiles leads to an outcome with associated uncertainty. This continuum and uncertainty preclude any attempts at constructing discriminative classifiers. However, a patient's gene expression profile could possibly be used in treatment planning, based on knowledge of other patients' responses. We conclude that many of the patients involved in such medical studies are intrinsically unclassifiable on the basis of provided PGL evidence. This additional category of 'unclassifiable' should be accommodated within medical decision support systems if serious errors and unnecessary adjuvant therapy are to be avoided.

    Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping

    Background: The Bayesian shrinkage technique has been applied to multiple quantitative trait loci (QTLs) mapping to estimate the genetic effects of QTLs on quantitative traits from a very large set of possible effects, including the main and epistatic effects of QTLs. Although the recently developed empirical Bayes (EB) method significantly reduced computation compared with the fully Bayesian approach, its speed and accuracy are limited by the fact that numerical optimization is required to estimate the variance components in the QTL model. Results: We developed a fast empirical Bayesian LASSO (EBLASSO) method for multiple QTL mapping. The fact that the EBLASSO can estimate the variance components in closed form, along with other algorithmic techniques, renders the EBLASSO method more efficient and accurate. Compared with the EB method, our simulation study demonstrated that the EBLASSO method could substantially improve the computational speed and detect more QTL effects without increasing the false positive rate. In particular, the EBLASSO algorithm running on a personal computer could easily handle a linear QTL model with more than 100,000 variables in our simulation study. Real data analysis also demonstrated that the EBLASSO method detected more reasonable effects than the EB method. Compared with the LASSO, our simulation showed that the current version of the EBLASSO implemented in Matlab had a speed similar to that of the LASSO implemented in Fortran, and that the EBLASSO detected the same number of true effects as the LASSO but a much smaller number of false positive effects. Conclusions: The EBLASSO method can handle a large number of effects, possibly including both the main and epistatic QTL effects, environmental effects and the effects of gene-environment interactions. It will be a very useful tool for multiple QTL mapping.

    Γ-stochastic neighbour embedding for feed-forward data visualization

    t-distributed Stochastic Neighbour Embedding (t-SNE) is one of the most popular nonlinear dimension reduction techniques used in multiple application domains. In this paper we propose a variation on the embedding neighbourhood distribution, resulting in Γ-SNE, which can construct a feed-forward mapping using an RBF network. We compare the visualizations generated by Γ-SNE with those of t-SNE and provide empirical evidence suggesting the network is capable of robust interpolation and automatic weight regularization.

    Direct estimation of wall shear stress from aneurysmal morphology: A statistical approach

    Computational fluid dynamics (CFD) is a valuable tool for studying vascular diseases, but requires long computational time. To alleviate this issue, we propose a statistical framework to predict the aneurysmal wall shear stress patterns directly from the aneurysm shape. A database of 38 complex intracranial aneurysm shapes is used to generate aneurysm morphologies and CFD simulations. The shapes and wall shear stresses are then converted to clouds of hybrid points containing both types of information. These are subsequently used to train a joint statistical model implementing a mixture of principal component analyzers. Given a new aneurysmal shape, the trained joint model is first collapsed to a shape-only model and used to initialize the missing shear stress values. The estimated hybrid point set is further refined by projection to the joint model space. We demonstrate that our predicted patterns can achieve significant similarities to the CFD-based results.

    ABCD Neurocognitive Prediction Challenge 2019: Predicting individual fluid intelligence scores from structural MRI using probabilistic segmentation and kernel ridge regression

    We applied several regression and deep learning methods to predict fluid intelligence scores from T1-weighted MRI scans as part of the ABCD Neurocognitive Prediction Challenge (ABCD-NP-Challenge) 2019. We used voxel intensities and probabilistic tissue-type labels derived from these as features to train the models. The best predictive performance (lowest mean-squared error) came from Kernel Ridge Regression (KRR; λ = 10), which produced a mean-squared error of 69.7204 on the validation set and 92.1298 on the test set. This placed our group in the fifth position on the validation leader board and first place on the final (test) leader board. Comment: Winning entry in the ABCD Neurocognitive Prediction Challenge at MICCAI 2019. 7 pages plus references, 3 figures, 1 table

    Radiocarbon dating of methane and carbon dioxide evaded from a temperate peatland stream

    Streams draining peatlands export large quantities of carbon in different chemical forms and are an important part of the carbon cycle. Radiocarbon (14C) analysis/dating provides unique information on the source of carbon and the rate at which it is cycled through ecosystems, as has recently been demonstrated at the air-water interface through analysis of carbon dioxide (CO2) lost from peatland streams by evasion (degassing). Peatland streams also have the potential to release large amounts of methane (CH4) and, though 14C analysis of CH4 emitted by ebullition (bubbling) has been previously reported, diffusive emissions have not. We describe methods that enable the 14C analysis of CH4 evaded from peatland streams. Using these methods, we investigated the 14C age and stable carbon isotope composition of both CH4 and CO2 evaded from a small peatland stream draining a temperate raised mire. Methane had an age of 1617-1987 years BP and was much older than CO2, which had an age range of 303-521 years BP. Isotope mass balance modelling of the results indicated that the CO2 and CH4 evaded from the stream were derived from different source areas, with most evaded CO2 originating from younger layers located nearer the peat surface compared to CH4. The study demonstrates the insight that can be gained into peatland carbon cycling from a methodological development which enables dual isotope (14C and 13C) analysis of both CH4 and CO2 collected at the same time and in the same way.

    Effect of fulvic acids on lead-induced oxidative stress to metal sensitive Vicia faba L. plant

    Lead (Pb) is a ubiquitous environmental pollutant capable of inducing changes in various morphological, physiological, and biochemical functions in plants. Only a few publications focus on the influence of Pb speciation on both its phytoavailability and phytotoxicity. Therefore, Pb toxicity (in terms of lipid peroxidation, hydrogen peroxide induction, and photosynthetic pigment contents) was studied in Vicia faba plants in relation to Pb uptake and speciation. V. faba seedlings were exposed to Pb supplied as Pb(NO3)2 or complexed by two fulvic acids (FAs), i.e. Suwannee River fulvic acid (SRFA) and Elliott Soil fulvic acid (ESFA), for 1, 12, and 24 h under controlled hydroponic conditions. For both FAs, Pb uptake and translocation by Vicia faba increased at the low level of application (5 mg l−1), whereas they decreased at the high level of application (25 mg l−1). Despite the increased Pb uptake with FAs at low concentrations, there was no influence on the Pb toxicity to the plants. However, at high concentrations, FAs reduced Pb toxicity by reducing its uptake. These results highlighted the role of the dilution factor for FA reactivity in relation to structure; SRFA was more effective than ESFA in reducing Pb uptake and alleviating Pb toxicity to V. faba due to its comparatively strong binding affinity for the heavy metal.

    R-Gada: a fast and flexible pipeline for copy number analysis in association studies

    Background: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association. Results: Here we present a new R package that integrates: (i) data import from most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the Copy Number calls, identification of recurrent CNVs, multivariate analysis of population structure, and tools for performing association studies. Using a large dataset containing 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset) we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (3 million probe arrays) on a single-core computer, and provides flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis. Conclusions: The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results.